Recidivism, or the tendency of a person previously convicted of a crime to reoffend, can be critical to understand in order to effectively target limited time and resources. Using a model to predict recidivism might help more fairly distribute resources to prevent justice-involved people from falling back into prison. Said model might just as easily be manipulated to tighten the disproportionate hold the parole and probation system has on certain communities.
This duality is highlighted by the decisions that go into a logistic regression, or logit, model which uses potential predictors to come up with a binary (yes/no) outcome. One of the most important considerations in building a logit model are the the tradeoffs between sensitivity and specificity:
options(scipen=10000000)
library(tidyverse)
library(kableExtra)
library(caret)
library(knitr)
library(pscl)
library(plotROC)
library(pROC)
library(lubridate)
library(tidycensus)
library(gt)
library(gtExtras)
library(sf)
library(janitor)
library(tigris)
library(RColorBrewer)
library(stargazer)
ga_recid <- read.csv("https://raw.githubusercontent.com/bwbrad/PPA_HW4_Logit/main/Data/NIJ_s_Recidivism_Challenge_Full_Dataset_20240409.csv",
na.strings = c(""," ","NA")) %>%
clean_names("snake") %>%
mutate(recidivism_within_3years = case_when(
recidivism_within_3years == "true" ~ 1,
recidivism_within_3years == "false" ~ 0),
recid_dv_char = case_when(
recidivism_within_3years == 1 ~ "Reoffended in 3 years",
recidivism_within_3years == 0 ~ "Did not reoffend in 3 years")) %>%
filter(!is.na(percent_days_employed))
recid_sum <- ga_recid %>%
group_by(recid_dv_char) %>%
summarize(people = n()) %>%
mutate(pct_people = people / sum(people))
This model is built using data from Georgia’s Office of Justice Programs on a subset of the state’s justice-involved population up through 2021. From a high level, 58% of people reoffend within 3 years. A greater share of reoffenses from the start means that whatever model we build will do a better job predicting those who do reoffend opposed to those who do not.
gt(recid_sum) %>%
fmt_number(columns = people, decimals = 0) %>%
fmt_percent(columns = pct_people, decimals = 1) %>%
cols_label(recid_dv_char = "Outcome",
people = "Number",
pct_people = "Percent") %>%
tab_header(title = md("**Overview of justice-involved population**")) %>%
gt_theme_nytimes() %>%
opt_row_striping()
| Overview of justice-involved population | ||
| Outcome | Number | Percent |
|---|---|---|
| Did not reoffend in 3 years | 10,469 | 41.3% |
| Reoffended in 3 years | 14,904 | 58.7% |
Below is a discussion on a variety of variables included in the Georgia dataset and their relationship to recidisim:
People who earned a HS diploma or less are much more represented in those which reoffended in under 3 years, suggesting level of education plays an important role in recidivism rates.
recid_by_education <- ga_recid %>%
group_by(education_level, recid_dv_char) %>%
summarize(justice_inv_pop = n())
ggplot() +
geom_col(data = recid_by_education,
aes(x = education_level, y = justice_inv_pop, fill = education_level),
position = "dodge",
show.legend = FALSE) +
scale_fill_brewer(type = "qual",
palette = "Paired") +
scale_y_continuous(labels = scales::label_comma()) +
facet_wrap(~ recid_dv_char) +
theme_minimal() +
labs(title = "Recidivism by eductional attainment",
x = "Level of educational attainment",
y = "Justice-involved pop.") +
theme(plot.title = element_text(face = "bold", margin = margin(b = 6)),
legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1))
The median percent of days employed is much higher for those who did not reoffend. What jumps out particularly is the substantially greater number of reoffenses for people who were employed zero days. This suggests that stability of income and employment may be critical in keeping formerly convicted people out of the justice system.
ggplot() +
geom_histogram(data = ga_recid,
aes(x = percent_days_employed, fill = recid_dv_char)) +
scale_fill_manual(values = c("#1f78b4", "#ff7f00"),
guide = "none") +
geom_vline(data = ga_recid %>%
filter(recid_dv_char == "Reoffended in 3 years"),
aes(xintercept = median(percent_days_employed, na.rm = T),
colour = "Reoffended in 3 years"),
lwd = 0.8,
lty = "longdash",
show.legend = TRUE) +
geom_vline(data = ga_recid %>%
filter(recid_dv_char == "Did not reoffend in 3 years"),
aes(xintercept = median(percent_days_employed, na.rm = T),
colour = "Did not reoffend in 3 years"),
lwd = 0.8,
lty = "dotdash",
show.legend = TRUE) +
scale_color_manual(name = "Median",
values = c("Reoffended in 3 years" = "#ff7f00",
"Did not reoffend in 3 years" = "#1f78b4")) +
scale_y_continuous(labels = scales::label_comma()) +
scale_x_continuous(labels = scales::label_percent()) +
facet_wrap(~ recid_dv_char, scales = "free") +
theme_minimal() +
labs(title = "Recidivism by % days employed",
x = "Share of days employed",
y = "Justice-involved pop.") +
theme(plot.title = element_text(face = "bold", margin = margin(b = 6)),
legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1))
People who are charged with property offenses are more represented in those which reoffended in under 3 years, suggesting type of prison offense plays an important role in recidivism rates.
recid_by_prisonoffense <- ga_recid %>%
group_by(prison_offense, recid_dv_char) %>%
summarize(justice_inv_pop = n()) %>%
filter(prison_offense != "NA")
ggplot() +
geom_col(data = recid_by_prisonoffense,
aes(x = prison_offense, y = justice_inv_pop, fill = prison_offense),
position = "dodge",
show.legend = FALSE) +
scale_fill_brewer(type = "qual",
palette = "Paired") +
scale_y_continuous(labels = scales::label_comma()) +
facet_wrap(~ recid_dv_char, scales = "free") +
theme_minimal() +
labs(title = "Recidivism by prison offense",
x = "Prison Offense",
y = "Justice-involved pop.") +
theme(plot.title = element_text(face = "bold", margin = margin(b = 6)),
legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1))
People who are sentenced to fewer years in prison, specifically fewer than 2 years, are more likely to re-offend in 3 years than those who are given longer prison sentences.
This might be hard to push, but the high occurrence of people with repeated shorter sentences could suggest that those with lesser offences (assuming lesser offenses = shorter prison sentence) should not be sentenced at all. There could be an alternative intervention for these offenders to keep them out of the prison system for repeated short stints.
recid_by_PrisonYrs <- ga_recid %>%
group_by(prison_years, recid_dv_char) %>%
summarize(justice_inv_pop = n())
recid_by_PrisonYrs$prison_years <-
factor(recid_by_PrisonYrs$prison_years,
levels = c("Less than 1 year", "1-2 years",
"Greater than 2 to 3 years", "More than 3 years"))
x_labs_PrisonYrs <- c("Less than 1 year", "1-2 years",
"2-3 years", "More than 3 years")
ggplot() +
geom_col(data = recid_by_PrisonYrs,
aes(x = prison_years, y = justice_inv_pop, fill = prison_years),
position = "dodge",
show.legend = FALSE) +
scale_fill_brewer(type = "qual",
palette = "Paired") +
scale_x_discrete(labels = x_labs_PrisonYrs) +
scale_y_continuous(labels = scales::label_comma()) +
facet_wrap(~ recid_dv_char, scales = "free") +
theme_minimal() +
labs(title = "Recidivism by prison years",
x = "Prison years",
y = "Justice-involved pop.") +
theme(plot.title = element_text(face = "bold", margin = margin(b = 6)),
legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1))
People who are younger at the time of their release are more likely to re-offend within three years. This suggests there needs to be more assistance to reintegrate younger people back into society after their prison release. Without proper social services and sustained behavioral intervention, younger people may be more inclined to fall back into previous behaviors.
recid_by_Age <- ga_recid %>%
group_by(age_at_release, recid_dv_char) %>%
summarize(justice_inv_pop = n())
ggplot() +
geom_col(data = recid_by_Age,
aes(x = age_at_release, y = justice_inv_pop, fill = age_at_release),
position = "dodge",
show.legend = FALSE) +
scale_fill_brewer(type = "qual",
palette = "Paired") +
scale_y_continuous(labels = scales::label_comma()) +
facet_wrap(~ recid_dv_char, scales = "free") +
theme_minimal() +
labs(title = "Recidivism by age at release",
x = "Age at Release",
y = "Justice-involved pop.") +
theme(plot.title = element_text(face = "bold", margin = margin(b = 6)),
legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1))
Unfortunately, most prisoners do not participate in programming, so it is difficult to determine the effect program attendance has on recidivism. There is some evidence that people who do not attend any programs during their original sentence are more likely to re-offend in three years. This may support the efforts of prison programming to prevent future offenses. Nevertheless, since the metric is statistically significantly associated with -.2 log odds for each program attended, we included it in our report.
recid_by_Program <- ga_recid %>%
group_by(program_attendances, recid_dv_char) %>%
summarize(justice_inv_pop = n())
recid_by_Program$program_attendances <-
factor(recid_by_Program$program_attendances,
levels = c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "10 or more"))
ggplot() +
geom_col(data = recid_by_Program,
aes(x = program_attendances, y = justice_inv_pop, fill = program_attendances),
position = "dodge",
show.legend = FALSE) +
scale_fill_brewer(type = "qual",
palette = "Paired") +
scale_y_continuous(labels = scales::label_comma()) +
facet_wrap(~ recid_dv_char) +
theme_minimal() +
labs(title = "Recidivism by program attendances",
x = "Program attendances",
y = "Justice-involved pop.") +
theme(plot.title = element_text(face = "bold", margin = margin(b = 6)),
legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1))
ga_recid <- ga_recid %>%
mutate(prison_years_over3 = case_when(
prison_years == "More than 3 years" ~ 1,
TRUE ~ 0),
programs_over10 = case_when(
program_attendances == "10 or more" ~ 1,
TRUE ~ 0),
programs_none = case_when(
program_attendances == "0" ~ 1,
TRUE ~ 0),
older_than37 = case_when(
age_at_release %in% c("38-42", "43-47", "48 or older") ~ 1,
age_at_release %in% c("18-22", "23-27", "23-27",
"28-32", "33-37") ~ 0),
some_college = case_when(
education_level == "At least some college" ~ 1,
education_level %in% c("High School Diploma",
"Less than HS diploma") ~ 0),
high_supervision = case_when(
supervision_level_first == "High" ~ 1,
TRUE ~ 0),
offense_property = case_when(
prison_offense == "Property" ~ 1,
TRUE ~ 0),
offense_violent_sex = case_when(
prison_offense == "Violent/Sex" ~ 1,
TRUE ~ 0),
one_dependent = case_when(
dependents == "1" ~ 1,
TRUE ~ 0),
)
set.seed(3456)
train_index <- createDataPartition(ga_recid$recidivism_within_3years, p = .50,
list = FALSE,
times = 1)
recid_train <- ga_recid[ train_index,]
recid_test <- ga_recid[-train_index,]
unique(ga_recid$supervision_level_first)
We made three different models, and settled on one with 12 different predictors of recidivism related to individual characteristics (education level, number of dependents), and criminal history (type of offense, number of years in prison). We found that our model predicted more effectively when we:
Every predictor in our final model is significant at a 95% confidence interval. Our model also has an 8.4% decrease between estimates of null deviance and residual deviance, indicating that our variables increase the model’s effectiveness over the intercept alone.
See a table of our model’s predictions below. Our model includes values that both increase and decrease the log odds of recidivism, including percent of days employed (1.164 decrease in log odds of recidivism per percent) and dummy variable for property offense (.317 increase in log odds of recidivism for those who committed property offenses).
Among those most at risk of recidivism include those who have only been in prison a short time, those who committed property offenses, and those who had prior parole violations.
recid_model_3 <- glm(recidivism_within_3years ~ some_college + offense_property +
offense_violent_sex + percent_days_employed +
prison_years_over3 + older_than37 + programs_over10 +
high_supervision + one_dependent +
prior_arrest_episodes_dv_charges + prior_revocations_parole +
prior_revocations_probation,
data = ga_recid,
family="binomial" (link="logit"))
stargazer(recid_model_3, type = "html")
| Dependent variable: | |
| recidivism_within_3years | |
| some_college | -0.396*** |
| (0.035) | |
| offense_property | 0.317*** |
| (0.030) | |
| offense_violent_sex | -0.588*** |
| (0.084) | |
| percent_days_employed | -1.164*** |
| (0.032) | |
| prison_years_over3 | -0.269*** |
| (0.034) | |
| older_than37 | -0.570*** |
| (0.029) | |
| programs_over10 | -0.298*** |
| (0.042) | |
| high_supervision | 0.163*** |
| (0.032) | |
| one_dependent | 0.094*** |
| (0.034) | |
| prior_arrest_episodes_dv_chargestrue | 0.385*** |
| (0.037) | |
| prior_revocations_paroletrue | 0.527*** |
| (0.048) | |
| prior_revocations_probationtrue | 0.154*** |
| (0.039) | |
| Constant | 1.033*** |
| (0.031) | |
| Observations | 25,373 |
| Log Likelihood | -15,743.140 |
| Akaike Inf. Crit. | 31,512.270 |
| Note: | p<0.1; p<0.05; p<0.01 |
Our ROC curve shows that, while our model is not ideal, it still predicts better than a random prediction. It has an area under the curve of .7, which indicates that is a fair, but not good predictor of recidivism. One problem with the curve is that it is a gentle arc, with no clear change in slope that would help us know where to change our predictions for recidivism.
auc(recid_test$outcome, recid_test$probs)
ggplot(recid_test, aes(d = as.numeric(outcome), m = probs)) +
geom_roc(n.cuts = 50, labels = FALSE, colour = "#1f78b4") +
style_roc(theme = theme_grey) +
geom_abline(slope = 1, intercept = 0, size = 1.5, color = 'grey') +
labs(title = "ROC Curve: Recidivism Model") +
theme_minimal() +
theme(plot.title = element_text(face = "bold", margin = margin(b = 6)))
recid_test <- recid_test %>%
mutate(outcome = as.factor(recidivism_within_3years),
probs = predict(recid_model_3, recid_test, type= "response"),
pred_outcome = as.factor(case_when(probs >= 0.5 ~ 1,
probs < 0.5 ~ 0)))
cm <- confusionMatrix(recid_test$pred_outcome, recid_test$outcome,
positive = "1")
cm_table <- cm$table
cm_df <- as.data.frame(cm_table)
rownames(cm_df) <- c("True Negative", "False Positive",
"False Negative", "True Positive")
The confusion matrix below highlights our model’s relative sensitivity and specificity. It shows a True Positive value of 6,015, meaning 6,015 entries are correctly identified as positive. In this case, that means our model correctly predicts 6,015 people who experience recidivism within three years. The False Positive value is 2,869, meaning 2,869 people are incorrectly predicted to experience recidivism within three years. The False Negative value of 1,417 represents the number of people who will experience recidivism in the next three years that our model failed to predict. The True Negative value of 2,385 represents the number of people our model correctly predicted to stay out of the prison system within three years.
Our model has an accuracy of just 66%, however the sensitivity, or recall, of our model is 0.81, meaning it correctly identifies 81% of people who will experience recidivism within three years. The specificity of our model is 0.45, meaning the model correctly identifies 45% of people who will not reoffend in three years.
gt(cm_df %>%
rownames_to_column()) %>%
fmt_number(columns = Freq, decimals = 0) %>%
tab_header(title = md("**Confusion Matrix**")) %>%
opt_align_table_header(align = "left") %>%
cols_label(Freq = "Frequency") %>%
gt_theme_nytimes() %>%
opt_row_striping()
| Confusion Matrix | |||
| Prediction | Reference | Frequency | |
|---|---|---|---|
| True Negative | 0 | 0 | 2,385 |
| False Positive | 1 | 0 | 2,869 |
| False Negative | 0 | 1 | 1,417 |
| True Positive | 1 | 1 | 6,015 |
Analysis of the interactions between race and gender suggest that the model underpredicts for white men and women compared to Black men and women. The rate of false positives for Black men is 22.8% vs. 19.7% for white men. Similarly, the false positive rate for Black women is 33.3% vs. 28.8% for white women. Recidivism is overall are less accurately predicted for women.
recid_test <- recid_test %>%
mutate(cm = case_when((outcome == 1) & (pred_outcome == 1) ~ "True Positive",
(outcome == 0) & (pred_outcome == 1) ~ "False Positive",
(outcome == 0) & (pred_outcome == 0) ~ "True Negative",
(outcome == 1) & (pred_outcome == 0) ~ "False Negative"),
gender = case_when(gender == "F" ~ "Women",
gender == "M" ~ "Men"),
race = case_when(race == "BLACK" ~ "Black",
race == "WHITE" ~ "White"))
error_by_demographics <- recid_test %>%
group_by(gender, race, cm) %>%
tally() %>%
mutate(pct_outcome = n / sum(n))
gt(error_by_demographics) %>%
fmt_number(columns = n, decimals = 0) %>%
fmt_percent(columns = pct_outcome, decimals = 1) %>%
cols_label(cm = "Prediction Category",
n = "Count",
pct_outcome = "% Subgroup") %>%
tab_header(title = md("**Prediction disparities by race & gender**")) %>%
opt_align_table_header(align = "left") %>%
tab_style(location = cells_row_groups(),
style = cell_text(weight = "bold")) %>%
gt_theme_nytimes() %>%
opt_row_striping()
| Prediction disparities by race & gender | ||
| Prediction Category | Count | % Subgroup |
|---|---|---|
| Men - Black | ||
| False Negative | 693 | 10.4% |
| False Positive | 1,526 | 22.8% |
| True Negative | 1,060 | 15.8% |
| True Positive | 3,412 | 51.0% |
| Men - White | ||
| False Negative | 576 | 13.0% |
| False Positive | 874 | 19.7% |
| True Negative | 955 | 21.5% |
| True Positive | 2,041 | 45.9% |
| Women - Black | ||
| False Negative | 44 | 8.7% |
| False Positive | 168 | 33.3% |
| True Negative | 115 | 22.8% |
| True Positive | 178 | 35.2% |
| Women - White | ||
| False Negative | 104 | 10.0% |
| False Positive | 301 | 28.8% |
| True Negative | 255 | 24.4% |
| True Positive | 384 | 36.8% |
It is difficult to evaluate the economic impacts of recidivism, as the negative inverse of recidivism (keeping someone in prison), is also a very costly enterprise.
To attempt to estimate the financial implication of our model, we utilize the following assumptions:
Together, these assumptions led us to the following cost estimates:
recid_test <- recid_test %>%
filter(!is.na(pred_outcome))
cost_benefit_table <-
recid_test %>%
count(pred_outcome, outcome) %>%
summarize(True_Negative = sum(n[pred_outcome==0 & outcome==0]),
True_Positive = sum(n[pred_outcome==1 & outcome==1]),
False_Negative = sum(n[pred_outcome==0 & outcome==1]),
False_Positive = sum(n[pred_outcome==1 & outcome==0])) %>%
gather(Variable, Count) %>%
mutate(Revenue =
ifelse(Variable == "True_Negative", Count * 0,
ifelse(Variable == "True_Positive",((.35 - .1) * Count),
ifelse(Variable == "False_Negative", (-0.35) * Count,
ifelse(Variable == "False_Positive", (-0.1) * Count, 0))))) %>%
bind_cols(data.frame(Description = c(
"We correctly predicted no recidivism",
"We correctly predicted recidivism",
"We predicted no recidivism and person reoffended",
"We predicted recidivism and person did not reoffend")))
#True negative (+$300,000)
#Person reintegrated into society, 50k /year on saved prison costs + 50k/year on benefit to society
#True positive (-$100,000)
#Cost of imprisonment per year in PA is $50,000/yr, saved cost of committed crime is $50k
#https://www.vera.org/publications/what-jails-cost-cities/philadelphia-pa
#False Negative(-$90,000)
#Estimated cost of a criminal trial at $15,000 + average of another 1.5 years in jail, $75,000
#Cost of person committing crime $50,000
#False Positive (-$300,000)
#Cost of imprisonment plus 50k / year opportunity cost of person not being in society
alt_cost_benefit_table <-
recid_test %>%
count(pred_outcome, outcome) %>%
summarize(True_Negative = sum(n[pred_outcome==0 & outcome==0]),
True_Positive = sum(n[pred_outcome==1 & outcome==1]),
False_Negative = sum(n[pred_outcome==0 & outcome==1]),
False_Positive = sum(n[pred_outcome==1 & outcome==0])) %>%
gather(Variable, Count) %>%
mutate(Cost =
ifelse(Variable == "True_Negative", Count * 300000,
ifelse(Variable == "True_Positive",((-100000) * Count),
ifelse(Variable == "False_Negative", (-90000) * Count,
ifelse(Variable == "False_Positive", (-300000) * Count, 0))))) %>%
bind_cols(data.frame(Description = c(
"We correctly predicted no recidivism",
"We correctly predicted recidivism",
"We predicted no recidivism and person reoffended",
"We predicted recidivism and person did not reoffend"))) %>%
mutate(Cost = Cost/1000000)
alt_cost_benefit_table <- alt_cost_benefit_table %>%
mutate(Variable = recode(Variable,
"True_Negative" = "True Negative",
"True_Positive" = "True Positive",
"False_Negative" = "False Negative",
"False_Positive" = "False Positive"))
gt(alt_cost_benefit_table) %>%
fmt_number(columns = Count, decimals = 0) %>%
fmt_currency(columns = Cost, decimals = 0) %>%
cols_label(Variable = "Outcome") %>%
tab_header(title = md("**Cost-benefit analysis**")) %>%
opt_align_table_header(align = "left") %>%
cols_label(Cost = "Cost ($Million)") %>%
gt_theme_nytimes() %>%
opt_row_striping()
| Cost-benefit analysis | |||
| Outcome | Count | Cost ($Million) | Description |
|---|---|---|---|
| True Negative | 2,385 | $716 | We correctly predicted no recidivism |
| True Positive | 6,015 | −$602 | We correctly predicted recidivism |
| False Negative | 1,417 | −$128 | We predicted no recidivism and person reoffended |
| False Positive | 2,869 | −$861 | We predicted recidivism and person did not reoffend |
In total, our model shows the government losing $875 million dollars a year. While our model may save as much as $716 million dollars for releasing prisoners early, the real lesson from this exercise is that prison is extremely expensive, and should be avoided whenever possible. However, as rates of recidivism show, prison itself is not a deterrent of further time in prison. Punishment for criminal activity needs to include non-carceral rehabilitation that will save the government money in the long run.
Overall, we determine that social determinants of criminality, including level of education, employment, and race have a significant impact on individuals’ likelihood of recidivism. At the same time, though our model can help the government release prisoners early who have low likelihoods of recidivism, it does nothing to resolve the overarching problem of high recidivism rates.
For further research, we recommend pairing the underlying social and economic causes of the factors that most reliably predict recidivism with appropriate ameliorative programming. These could include:
While prison may be an important tool for managing public safety, the best economic and social tools for improving outcomes in the criminal justice system are keeping people out of prison in the first place, and helping them stay out when they’re released.
While it is important to help people stay out of prison, our model can still be useful in and of itself. Especially given the low accuracy of our model, it should be used as a tool for matching prisoners with support services, rather than as a decider of whether or not someone should be released or kept in prison. For example, if a prisoner has a high log-likelihood of recidivism, instead of keeping them in prison, our model could identify them as a target for specific support services once released.The model can also be refined with future observations of trends in recidivism, with those added support systems.
See graphics below that were not included in our final modeling, but offer extra evidence of the interaction between personal characteristics and recidivism. We also include the distribution of our model’s predictions, based on actual results to show the difficulty of predicting at the margins.
Residence changes does not really suggest housing instability is related to recidivism. Nor is gang affiliation (not listed below).
recid_by_moving <- ga_recid %>%
group_by(residence_changes, recid_dv_char) %>%
summarize(justice_inv_pop = n())
ggplot() +
geom_col(data = recid_by_moving,
aes(x = residence_changes, y = justice_inv_pop, fill = residence_changes),
position = "dodge",
show.legend = FALSE) +
scale_fill_brewer(type = "qual",
palette = "Paired") +
scale_y_continuous(labels = scales::label_comma()) +
facet_wrap(~ recid_dv_char) +
theme_minimal() +
labs(title = "Recidivism by number of residence changes",
x = "Number of residence changes",
y = "Justice-involved pop.") +
theme(plot.title = element_text(face = "bold", margin = margin(b = 6)),
legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1))
Episodes of prior gun charges doesn’t predict recidivism in 3 years either.
recid_by_PriorGun <- ga_recid %>%
group_by(prior_arrest_episodes_gun_charges, recid_dv_char) %>%
summarize(justice_inv_pop = n())
ggplot() +
geom_col(data = recid_by_PriorGun,
aes(x = prior_arrest_episodes_gun_charges, y = justice_inv_pop, fill = prior_arrest_episodes_gun_charges),
position = "dodge",
show.legend = FALSE) +
scale_fill_brewer(type = "qual",
palette = "Paired") +
scale_y_continuous(labels = scales::label_comma()) +
facet_wrap(~ recid_dv_char) +
theme_minimal() +
labs(title = "Recidivism by episodes of prior gun charges",
x = "Episodes of prior gun charges",
y = "Justice-involved pop.") +
theme(plot.title = element_text(face = "bold", margin = margin(b = 6)),
legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1))
palette2 <- c("#ff7f00","#1f78b4")
ggplot(subset(recid_test, pred_outcome != "NA"), aes(x = probs, fill = as.factor(outcome))) +
geom_density(alpha = .75,
color = NA) +
scale_fill_manual(name = "Outcome",
labels = c("Did not reoffend (0)", "Reoffended (1)"),
values = palette2) +
labs(x = "Recidivism", y = "Density of probabilities",
title = "Distribution of predicted probabilities by observed outcome") +
theme_minimal() +
theme(legend.position = "bottom",
plot.title = element_text(face = "bold", margin = margin(b = 6)),
strip.text.x = element_text(size = 18))